In narrative texts punctuation marks obey the same statistics as words
Abstract
From a grammatical point of view, the role of punctuation marks in a sentence is formally defined and well understood. In semantic analysis, punctuation also plays a crucial role as a means of avoiding ambiguity of meaning. The situation is different in statistical analyses of language samples. There, whether punctuation marks should be considered or neglected is seen as a rather arbitrary decision and is currently left to the researcher's preference. The objective of this work is to shed some light on this problem by answering two questions: whether punctuation marks may be treated as ordinary words, and whether they should be included in analyses of word co-occurrences. We already know from our previous study (S. Drożdż et al., Inf. Sci. 331 (2016) 32-44) that full stops, which determine the length of sentences, are the main carrier of long-range correlations. Here we extend that study. We analyze the statistical properties of the most common punctuation marks in a few Indo-European languages, investigate their frequencies, locate them accordingly in Zipf rank-frequency plots, and study their role in word-adjacency networks. We show that, from a statistical viewpoint, punctuation marks reveal properties that are qualitatively similar to those of the most frequent words, such as articles, conjunctions, pronouns, and prepositions. This holds for both the Zipfian analysis and the network analysis. By adding punctuation marks to the Zipf plots, we also show that these plots, which are normally described by the Zipf-Mandelbrot distribution, largely restore the power-law Zipfian behaviour for the most frequent items.

Our results indicate that punctuation marks can fruitfully be considered in linguistic studies. Their inclusion effectively extends the dimensionality of the analysis and therefore opens more room for the manifestation of previously unobserved effects.

Preprint submitted to Information Sciences, September 29, 2016. arXiv:1604.00834v2 [cs.CL], 27 Sep 2016.
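The central methodological step described above, counting punctuation marks as ordinary tokens before building Zipf rank-frequency lists and word-adjacency networks, can be sketched as follows. This is a minimal illustration under its own assumptions, not the authors' pipeline. The tokenizer, the chosen set of punctuation marks (`PUNCT_CHARS`), the directed counting of adjacent pairs, and all function names are hypothetical. The Zipf-Mandelbrot form used here, f(r) = c / (r + q)^α, reduces to the pure Zipf power law f(r) = c · r^(-α) when q = 0.

```python
import re
from collections import Counter, defaultdict

# Illustrative choice (not taken from the paper): punctuation marks treated as tokens.
PUNCT_CHARS = ".,;:!?"
PUNCT = frozenset(PUNCT_CHARS)

# A word is a run of Unicode letters, optionally with one inner apostrophe;
# each listed punctuation mark is matched as a separate one-character token.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)?|[" + re.escape(PUNCT_CHARS) + r"]")


def tokenize(text, keep_punctuation=True):
    """Lower-case the text and split it into word and punctuation tokens."""
    tokens = TOKEN_RE.findall(text.lower())
    if not keep_punctuation:
        tokens = [t for t in tokens if t not in PUNCT]
    return tokens


def rank_frequency(tokens):
    """Return (rank, token, count) triples, rank 1 being the most frequent token.

    Ties are broken alphabetically so that the ranking is deterministic.
    """
    ordered = sorted(Counter(tokens).items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, tok, count) for rank, (tok, count) in enumerate(ordered, start=1)]


def adjacency_network(tokens):
    """Count directed links between consecutive tokens: (a, b) -> times b follows a."""
    return Counter(zip(tokens, tokens[1:]))


def degrees(edges):
    """Number of distinct neighbours of each token, ignoring direction and self-loops."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {tok: len(n) for tok, n in neighbours.items()}


def zipf_mandelbrot(rank, c, q, alpha):
    """Zipf-Mandelbrot form f(r) = c / (r + q)**alpha; q = 0 gives the pure Zipf law."""
    return c / (rank + q) ** alpha


if __name__ == "__main__":
    text = "The cat sat on the mat. The dog, however, sat on the cat."
    tokens = tokenize(text)
    for rank, tok, count in rank_frequency(tokens)[:4]:
        print(rank, repr(tok), count)
    print(degrees(adjacency_network(tokens))["the"])
```

On this toy sentence, the comma and the full stop rank second and third, right after "the", and both are linked to several distinct neighbours in the adjacency network. A toy example like this only shows the mechanics; statements about Zipfian or network behaviour require large corpora.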
Similar articles
An Automatic Punctuation Marks System For Arabic Texts
This work presents a system for automatic Arabic punctuation. Existing approaches to automatic punctuation do not perform well on Arabic texts and do not satisfy user needs for them. The importance of, and rising need for, automating the correct insertion of punctuation marks in Arabic texts led to the need for a specific analysis of the Arabic language to introduce approaches th...
Discursive Usage of Six Chinese Punctuation Marks
Both rhetorical structure and punctuation have been helpful in discourse processing. Based on a corpus annotation project, this paper reports the discursive usage of 6 Chinese punctuation marks in news commentary texts: Colon, Dash, Ellipsis, Exclamation Mark, Question Mark, and Semicolon. The rhetorical patterns of these marks are compared against patterns around cue phrases in general. Result...
Pii: S0306-4573(01)00044-9
Most work in NLP requires that texts have been previously segmented into sentences and words. Segmenting a text into sentences and words, however, is a complex task, due to the ambiguity of many punctuation marks and spaces. Furthermore, Web texts such as HTML documents are harder to turn into well-refined, segmented texts because they are written in a freer style, with many se...
Management of cohesion in the written productions of monolingual Persian-speaking students with specific language disorder
Introduction: Students with specific language impairment (SLI) have many difficulties in producing coherent written texts. The goal of this study was to investigate and compare the management of cohesion in the written production of individuals with SLI and their normal peers in terms of density and diversity of connectives, the density of punctuation marks (periods and commas) and density and d...
Qualitative and Quantitative Examination of Text Type Readabilities: A Comparative Analysis
This study compared 2 main approaches to readability assessment. The quantitative approach applied idea density based on part of speech tagging and compared 3 sets of text types (i.e., narrative, expository, and argumentative) with respect to their ease of reading. The qualitative approach was done through developing questionnaires measuring intermediate EFL learners' perceptions on content, motivat...
Journal title:
- Inf. Sci.
Volume 375
Publication date: 2017